Abstract: In a multimodal biometric system, an efficient
fusion method is necessary for combining information from
the individual single-modality systems. Score-level fusion is
used here to combine the outputs derived from different
biometric modalities. Three biometric characteristics are
considered in this study: face, fingerprint and voice.
Classification methods are also a basis for important
improvements in recognition accuracy; artificial neural
networks (ANN) and support vector machines (SVM) are
regarded as excellent classification techniques. This paper
compares the multimodal biometric recognition performance
of methods that have been successfully applied to score
fusion. After processing each modality (face, fingerprint and
voice), we obtain three similarity scores. These scores are
then fed into two different classifiers: ANN and SVM.
Experimental results demonstrate that a multimodal
biometric system performs better than a system using a
single modality. The comparison of SVM and ANN for
score-level fusion shows an average recognition rate (ARR)
of 57.69% with ANN, while SVM-based fusion gives an ARR
of 63.31%.
Keywords: Multimodal biometric system, Voice, Fingerprint,
Face, Recognition, Score-level, Fusion, ANN, SVM.

I. Introduction 

Nowadays, there is a strong demand for automatic and secure
identity verification systems. Identifying individuals has
become an essential task for ensuring the safety of systems and
organizations. Multimodal biometric identification is a recent
technology that addresses this problem. Many biometric
modalities, including fingerprint, face and speech, have been
proposed for verification and identification purposes. Several
works on multimodal biometric systems have already been
reported in the literature. Dieckmann et al. [1] proposed an
abstract-level fusion scheme, the "2-of-3 approach", that
integrates lip movement, face and voice, based on the principle
that a human uses several cues in parallel to identify a person.
Brunelli and Falavigna [2] proposed a measurement-level scheme
to combine the outputs of sub-classifiers. Kittler et al. [3]
demonstrated the effectiveness of an integration strategy that
merges multiple snapshots of a single biometric property using a
Bayesian framework. Bigun et al. [4] proposed a Bayesian
integration scheme for combining different pieces of evidence.
Maes et al. [5] proposed combining biometric data (e.g.
fingerprint) with non-biometric data (e.g. passwords). Hong and
Jain [6] developed a multimodal identification system that
incorporates two different biometrics (fingerprint and face).

However, despite significant research, biometric matching 
accuracy remains low. This accuracy problem has recently been 
addressed through multi-modal biometric fusion, which 
combines the matching scores obtained through individual 
biometric classifiers. 

In this paper, we present a multimodal biometric system that
satisfies several constraints of comfort [10] and reliability
(higher recognition rate, inexpensive computation, robustness).
The fusion phase compensates for the lack of information
resulting from the use of a single modality. We also propose an
adaptive system for recognizing individuals by fusing three
biometric modalities: fingerprint, face and voice. Fusion was
performed using support vector machines (SVM) on the one hand
and artificial neural networks (ANN) on the other. These
classification methods have greatly enriched biometric
recognition methods.

This article is organized as follows: Section II describes the
unimodal biometric systems. Section III presents the proposed
multimodal system using ANN and SVM respectively. Section IV
discusses the experimental results of these approaches: the
performance of the proposed multimodal approach using ANN is
analyzed and compared with that of the proposed multimodal
approach using SVM. The final section presents the conclusion
and discusses perspectives for future work.

II. Unimodal recognition 

To test our multi-modal fusion technique, we use one classifier 
for each of the following biometric modalities: Fingerprint, face 
and voice. 

A. Fingerprint Recognition 

This method relies on extracting minutiae, the characteristic
points of the fingerprint, such as the ridge ending, the point
where a ridge stops (Figure 1-a), and the bifurcation, the point
where a ridge divides into two (Figure 1-b).


Figure 1: Fingerprint minutiae: (a) ridge ending, (b) bifurcation

The preprocessing phase is essential in a pattern recognition
system. To improve the quality of the information extracted from
the images, one can specify regions of interest or enhance the
contrast of the images [5]. To avoid extracting false minutiae,
several processing steps are performed: binarization,
skeletonization (thinning), selection of the region of interest
and minutiae extraction.
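The last step of this pipeline can be sketched in a few lines. The paper does not say which extractor it uses, so the example below assumes the common crossing-number rule applied to an already thinned, one-pixel-wide binary ridge map; the function names and the toy skeleton are hypothetical.

```python
# Crossing-number minutiae detection on a thinned binary ridge map.
# Assumption: the crossing-number rule is a common textbook choice,
# not necessarily the extractor used in this paper.

def crossing_number(skel, r, c):
    """Half the number of 0/1 transitions around the 8-neighbourhood of (r, c)."""
    # Neighbours visited in circular order, starting at the pixel to the right.
    offsets = [(0, 1), (-1, 1), (-1, 0), (-1, -1),
               (0, -1), (1, -1), (1, 0), (1, 1)]
    ring = [skel[r + dr][c + dc] for dr, dc in offsets]
    return sum(abs(ring[k] - ring[(k + 1) % 8]) for k in range(8)) // 2


def extract_minutiae(skel):
    """Return (row, col, kind) for every ridge ending and bifurcation."""
    minutiae = []
    # Border pixels are skipped so that every visited pixel has 8 neighbours.
    for r in range(1, len(skel) - 1):
        for c in range(1, len(skel[0]) - 1):
            if skel[r][c] != 1:
                continue
            cn = crossing_number(skel, r, c)
            if cn == 1:
                minutiae.append((r, c, "ending"))
            elif cn == 3:
                minutiae.append((r, c, "bifurcation"))
    return minutiae


# Toy skeleton: a horizontal ridge that splits into two diagonal branches.
SKELETON = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]

if __name__ == "__main__":
    for minutia in extract_minutiae(SKELETON):
        print(minutia)
```

On the toy skeleton, the three free ridge tips are reported as endings and the pixel where the ridge splits is reported as a bifurcation. A real system would additionally filter spurious minutiae near the image border and in noisy regions.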

The overall architecture of a fingerprint recognition system is
shown in Figure 2.

C. Voice recognition

Voice recognition transforms a speech signal into a sequence of
symbols representative of the signal content. The most
commonly used features are the Mel-frequency cepstral
coefficients (MFCC), whose extraction is shown in Figure 4.



Figure 2: Principle of a fingerprint recognition system 


Fingerprint matching is known to be a relatively accurate
biometric even with only partial fingerprint data [32], [33].

B. Face recognition 

Facial recognition is a task that humans perform naturally and
effortlessly in their daily lives. It is one of the basic biometric
technologies and has taken an increasingly important place in
research, owing to rapid advances in technologies such as digital
cameras, the Internet and mobile devices, together with
constantly increasing security needs. Facial recognition has
several advantages over other biometric technologies: it is
inexpensive, well accepted by the public and requires no action
by the user (non-intrusive and contactless). The basic operating
principle of a facial recognition system is illustrated in Figure 3.
It can be summarized in four stages: detection [3] and
normalization [4] of the face, followed by the two blocks of the
recognition phase, namely feature extraction and comparison of
the extracted features with those stored in the database.




Figure 4: Principle of the extraction of MFCC coefficients 

III. THE PROPOSED MULTIMODAL ARCHITECTURE 

Unimodal biometric systems, which rely on a single biometric
signature, cannot currently guarantee an excellent recognition
rate. The error rates associated with unimodal biometric systems
are relatively high, which makes them unacceptable for
deployment in safety-critical applications. To overcome these
drawbacks, we propose a novel multimodal architecture based on
the fusion of the biometric modalities presented above. This
multimodal system needs an effective fusion scheme to combine
the biometric characteristics derived from the different
modalities. We use fusion at the score level, which has a high
potential for efficiently consolidating the outputs of multiple
unimodal biometric matchers.

The proposed approach merges the output scores of three
different unimodal recognition systems using two types of
classifiers: ANN and SVM. We compare the performance of
ANN-based fusion with that of SVM-based fusion.


Figure 3: Principle of a facial recognition system

Figure 5: Score-level fusion for the multimodal biometric system

Neural networks are widely used for classification, process
control, modeling of static data, modeling of dynamic processes,
etc.

The interconnection of neurons forms a network. Neurons are
arranged in layers: an input layer, an output layer and one or
more hidden layers between them. This kind of network is called
a multilayer network (Figure 10).



To achieve performance close to that observed in humans, a
classifier based on artificial neural networks (ANN) is used to
fuse the data of the three biometric modalities collected and
processed by the individual classifiers. Using ANNs on the three
separate biometrics, we obtain two different scores, which are
fed to a third ANN in order to reach the final decision. Figure 7
shows the global structure of the proposed system.
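The two-stage structure described here, in which two pairwise fusion networks each produce a score and a third network combines them, can be sketched as a forward pass. This is only an illustration under stated assumptions: the network size, the sigmoid activation and the weights in `TOY_NET` are hypothetical and untrained, and input scores are assumed to lie in [0, 1].

```python
import math


def sigmoid(z):
    """Logistic activation, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def mlp_forward(inputs, hidden_w, hidden_b, out_w, out_b):
    """Forward pass of a one-hidden-layer perceptron with one sigmoid output."""
    hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
              for row, b in zip(hidden_w, hidden_b)]
    return sigmoid(sum(w * h for w, h in zip(out_w, hidden)) + out_b)


# Hypothetical, untrained weights for a 2-input, 2-hidden-unit network.
# In practice each of the three networks would have its own trained weights.
TOY_NET = dict(hidden_w=[[2.0, 1.0], [1.0, 2.0]], hidden_b=[-1.5, -1.5],
               out_w=[3.0, 3.0], out_b=-3.0)


def fuse_scores(fingerprint, face, voice, net=TOY_NET):
    """Two pairwise fusion networks followed by a third, final network."""
    s_fingerprint_face = mlp_forward([fingerprint, face], **net)
    s_voice_face = mlp_forward([voice, face], **net)
    return mlp_forward([s_fingerprint_face, s_voice_face], **net)


if __name__ == "__main__":
    print(fuse_scores(0.9, 0.8, 0.7))  # strong unimodal scores
    print(fuse_scores(0.1, 0.2, 0.3))  # weak unimodal scores
```

Because every toy weight is positive, the fused score increases with each unimodal score; a trained network would not necessarily have that property.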


A. Fusion with ANN 

The work done to understand the behavior of the human brain
has led to representing it as a set of massively interconnected
structural components called neurons. The human brain contains
on the order of a hundred billion neurons, each connected, on
average, to ten thousand others. The brain is able to organize
these neurons into a complex, non-linear and highly parallel
assembly so as to accomplish sophisticated tasks. For example,
anyone is able to recognize faces, while this task is almost
impossible for a classical computer. The attempt to give the
computer the perceptual qualities of the human brain leads to an
electrical model of the brain, and artificial neural networks try
to realize this model. Haykin gives the following definition:

“A neural network is a massively parallel distributed processor
that has a natural propensity for storing experiential knowledge
and making it available for use. It resembles the brain in two
respects:

Knowledge is acquired through a learning process.

The weights of the connections between neurons are
used to store the knowledge.”


In the proposed approach, we chose to combine the fingerprint
with the face, and the voice with the face.

Figure 7: The proposed multimodal architecture based on ANN


B. Fusion with SVM 

The main idea of the classification approach is to construct a
feature vector from the matching scores output by the separate
matchers. This feature vector is classified into one of two
classes: “Accept” (genuine user) or “Reject” (impostor). In
general, the classifier used for this purpose is capable of
learning the decision boundary irrespective of how the feature
vector is constructed.
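As a minimal sketch of this idea, the code below builds a three-element feature vector from raw matcher scores and assigns it to "Accept" or "Reject" with a linear decision function. The min-max normalization step, the score ranges, the weights and the bias are all assumptions made for illustration; the paper does not specify them, and in the actual system the decision function is learned by the classifier.

```python
def min_max_normalize(score, lo, hi):
    """Map a raw matcher score to [0, 1] given that matcher's score range."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return min(1.0, max(0.0, (score - lo) / (hi - lo)))


def build_feature_vector(raw_scores, ranges):
    """One normalized component per matcher (fingerprint, face, voice)."""
    return [min_max_normalize(s, lo, hi)
            for s, (lo, hi) in zip(raw_scores, ranges)]


def classify(features, weights, bias):
    """Linear decision rule: 'Accept' when w . x + b >= 0, else 'Reject'."""
    activation = sum(w * x for w, x in zip(weights, features)) + bias
    return "Accept" if activation >= 0 else "Reject"


# Hypothetical score ranges for the three matchers.
RANGES = [(0.0, 100.0), (0.0, 1.0), (-50.0, 0.0)]
# Hypothetical decision function; a trained classifier would supply these.
WEIGHTS, BIAS = [1.0, 1.0, 1.0], -1.5

if __name__ == "__main__":
    genuine = build_feature_vector([80.0, 0.7, -10.0], RANGES)
    impostor = build_feature_vector([20.0, 0.3, -40.0], RANGES)
    print(genuine, classify(genuine, WEIGHTS, BIAS))
    print(impostor, classify(impostor, WEIGHTS, BIAS))
```

Normalizing first keeps a matcher with a wide raw score range from dominating the feature vector, which matters for both ANN and SVM classifiers.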


The development of artificial neural networks rests on this
definition.


Overview of Support Vector Machine (SVM) 

In 1992, Boser, Guyon, and Vapnik introduced the Support Vector
Machine (SVM), which has since become popular. SVMs are a set
of related supervised learning methods used for classification
and regression [21]. They belong to a family of generalized
linear classifiers.

Vapnik developed the foundations of Support Vector Machines
(SVM) [19], which have gained popularity due to many promising
features such as better empirical performance. The formulation
uses the Structural Risk Minimization (SRM) principle, which has
been shown to be superior to the traditional Empirical Risk
Minimization (ERM) principle used by conventional neural
networks. SRM minimizes an upper bound on the expected risk,
whereas ERM minimizes the error on the training data.

In biometrics, Support Vector Machine has been utilized for 
different learning based operations such as face recognition and 
multimodal fusion. 

SVM is therefore a classifier that performs classification by
building hyperplanes in a multidimensional space that separate
the data points into different classes.

Linearly separable data

Let $\{x_i, y_i\}$ be a set of N data vectors with $x_i \in \mathbb{R}^n$,
$y_i \in \{+1, -1\}$ and $i = 1, \ldots, N$, where $x_i$ is the ith data
vector and $y_i$ is the binary class to which it belongs.

Figure 8: Linear separation hyperplane for linearly separable
data.

The points for which equality holds in (2) lie on a hyperplane H1:

$$ w \cdot x + b = +1 \tag{5} $$

Similarly, the points for which equality holds in (3) lie on the
hyperplane H2:

$$ w \cdot x + b = -1 \tag{6} $$


A binary classifier should find a function f that maps the points
from their data space to their label space:

$$ f : \mathbb{R}^n \to \{+1, -1\}, \qquad x_i \mapsto y_i $$


The distance $d(w, b; x_i)$ of a point $x_i$ from the hyperplane
$(w, b)$ is:

$$ d(w, b; x_i) = \frac{|w \cdot x_i + b|}{\|w\|} \tag{7} $$
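Equation (7) can be checked numerically with a few lines of code. The helper names and the example hyperplane below are hypothetical; the second function uses the fact that the hyperplanes H1 and H2 of (5) and (6) each lie at distance $1/\|w\|$ from the separating hyperplane, so they are $2/\|w\|$ apart.

```python
import math


def distance_to_hyperplane(w, b, x):
    """Distance of point x from the hyperplane w . x + b = 0, as in (7)."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def margin_width(w):
    """Distance between H1 (w . x + b = +1) and H2 (w . x + b = -1)."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


if __name__ == "__main__":
    w, b = (3.0, 4.0), -5.0  # hypothetical hyperplane in R^2, ||w|| = 5
    print(distance_to_hyperplane(w, b, (3.0, 4.0)))  # |9 + 16 - 5| / 5 = 4.0
    print(margin_width(w))                           # 2 / 5 = 0.4
```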


For simplicity, we suppose that the data space is $\mathbb{R}^2$ and
that a hyperplane separates the data. There are in fact an
infinite number of hyperplanes that could divide the data into
two classes. In accordance with the SRM principle, SVM uses
an iterative training algorithm that maximizes the margin
between the two classes to construct a single optimal hyperplane.

Assuming that we have a hyperplane separating the positive
data from the negative data, the points x lying on this
hyperplane satisfy:

$$ w \cdot x + b = 0 \tag{1} $$

In this equation, w is the vector normal to the hyperplane and b
is the offset parameter of the hyperplane.

For the training data we require:

$$ w \cdot x_i + b \ge +1 \quad \text{for } y_i = +1 \tag{2} $$

$$ w \cdot x_i + b \le -1 \quad \text{for } y_i = -1 \tag{3} $$

These constraints can be combined into the following inequality:

$$ y_i (w \cdot x_i + b) \ge 1, \quad i = 1, \ldots, N \tag{4} $$

Figure 8 shows the linearly separable case treated above.

The optimal hyperplane is the one for which the distance to the
nearest points (the margin) is maximal. The margin equals
$2 / \|w\|$, so maximizing it amounts to minimizing
$\frac{1}{2}\|w\|^2$ subject to (4). For this, the problem is
reformulated with a Lagrangian. There are two reasons for this.
First, the constraint (4) is replaced by a constraint on the
Lagrange multipliers, which is easier to handle. Second, in this
reformulation the training data appear only in the form of dot
products. Positive multipliers $\alpha_i$, $i = 1, \ldots, N$, are
therefore introduced, one for each constraint in (4). The
constraints in (4) are multiplied by $\alpha_i$ and subtracted from
the objective, which gives:

$$ L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{N} \alpha_i \left[ y_i (w \cdot x_i + b) - 1 \right], \quad \alpha_i \ge 0 \tag{8} $$

L is called the primal Lagrangian.

It must be minimized with respect to w and b while requiring
that its derivatives with respect to all the Lagrange multipliers
$\alpha_i$ vanish. Setting the gradients of L with respect to w and b
to zero gives $w = \sum_{i=1}^{N} \alpha_i y_i x_i$ and
$\sum_{i=1}^{N} \alpha_i y_i = 0$; substituting these into (8), we
obtain:

$$ L' = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j \tag{9} $$

L' is called the dual Lagrangian.


The points for which $\alpha_i$ is strictly greater than 0 are called
support vectors; they lie on one of the hyperplanes H1 or H2.
These points are the closest to the decision boundary and they
define the separating hyperplane.

Non-linearly separable data

If no hyperplane can be found that separates the data, a
nonlinear mapping function is needed. To handle the
non-linearly separable case, the idea of SVM is to change the
data space. The data are mapped nonlinearly into a
high-dimensional space, and the optimal hyperplane is computed
in that space. The nonlinear transformation of the data can make
the examples linearly separable in the new space, at the cost of
a change of dimension. This new space is called the
"re-description space". Intuitively, the larger the dimension of
the re-description space, the higher the probability of finding a
separating hyperplane between the examples. This is illustrated
by the following scheme:

Figure 9: Non-linearly separable data.


Where the examples are not linearly separable, the constraints
(2) and (3) are relaxed by introducing slack variables
$\xi_i \ge 0$, $i = 1, \ldots, N$, and become:

$$ w \cdot x_i + b \ge +1 - \xi_i \quad \text{for } y_i = +1 \tag{10} $$

$$ w \cdot x_i + b \le -1 + \xi_i \quad \text{for } y_i = -1 \tag{11} $$


A nonlinear separation problem in the representation space is
thus transformed into a linear separation problem in a
re-description space of higher dimension. This non-linear
transformation is performed using a specific kernel function.
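The effect of a kernel can be illustrated on the classic XOR layout, which no straight line separates. The sketch below evaluates the standard kernel form of the SVM decision function, the sign of $\sum_i \alpha_i y_i K(x_i, x) + b$, with a degree-2 polynomial kernel. The support vectors, multipliers and bias are hand-picked for this toy example rather than produced by a training algorithm, and all names are hypothetical.

```python
def linear_kernel(x, z):
    """Plain dot product: no change of space."""
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x, z, degree=2, coef0=1.0):
    """K(x, z) = (x . z + coef0) ** degree."""
    return (sum(a * b for a, b in zip(x, z)) + coef0) ** degree


def svm_decision(x, support_vectors, labels, alphas, bias, kernel):
    """Sign of sum_i alpha_i * y_i * K(x_i, x) + bias."""
    s = sum(a * y * kernel(sv, x)
            for sv, y, a in zip(support_vectors, labels, alphas)) + bias
    return 1 if s >= 0 else -1


# XOR-like data: opposite corners share a label.
XOR_POINTS = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
XOR_LABELS = [-1, +1, +1, -1]
# Hand-picked multipliers and bias for illustration (assumption, not trained).
XOR_ALPHAS = [0.125, 0.125, 0.125, 0.125]
XOR_BIAS = 0.0

if __name__ == "__main__":
    for point, label in zip(XOR_POINTS, XOR_LABELS):
        predicted = svm_decision(point, XOR_POINTS, XOR_LABELS,
                                 XOR_ALPHAS, XOR_BIAS, polynomial_kernel)
        print(point, label, predicted)
```

With these values the decision function reduces to $-x_1 x_2$, which is not linear in the original coordinates but is linear in the feature space implied by the polynomial kernel.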

Upon receiving the three match scores from the participating 
individual biometric modules, the fusion phase creates an 
attribute vector out of these individual scores, and applies the 
learned SVM that best corresponds to the incoming data. 

IV. EXPERIMENTAL RESULTS 

The experimental results that we present are divided into two
parts. We first summarize the results obtained for each
unimodal recognition system (fingerprint, face and voice) based
on artificial neural networks (ANN). We then present the results
of the proposed multimodal biometric fusion system using three
ANN classifiers.

First, to evaluate the performance of a biometric system, four
main criteria must be clearly defined:

- The recognition rate, which is calculated as follows:

$$ \text{Recognition rate} = \frac{\text{Number of recognized persons}}{\text{Number of people in the test base}} \tag{12} $$

This metric is used mainly to evaluate biometric identification
systems. The system is tested with images different from those
used for learning.

- The "False Reject Rate" (or FRR): This rate represents the
percentage of people who are expected to be recognized but are
rejected by the system.

$$ \text{FRR} = \frac{\text{Number of false rejections}}{\text{Total number of authentic accesses}} \tag{13} $$

- The "False Accept Rate" (or FAR): This rate represents the
percentage of people who are not expected to be recognized but
are nevertheless accepted by the system.

$$ \text{FAR} = \frac{\text{Number of false acceptances}}{\text{Total number of impostor accesses}} \tag{14} $$

- The "Equal Error Rate" (or EER): This rate is calculated from
the first two criteria and is a commonly used performance
measurement point. It is the point where FAR is equal to FRR,
that is to say, the best compromise between false rejection and
false acceptance.

$$ \text{EER} = \frac{\text{Number of false rejections} + \text{Number of false acceptances}}{\text{Total number of accesses}} \tag{15} $$



Figure 10: A graphical representation of the FRR, FAR errors, 
the optimal threshold and the EER. [34] 

Figure 10 shows an example of such a representation. Varying the
threshold along the x-axis gives different values of FAR and
FRR. Where the threshold is low, the False Acceptance Rate is
high and the False Rejection Rate is low, and conversely where
the threshold is high.
A large FAR means that an impostor is likely to be accepted as a
client, while a large FRR means that a client is likely to be
rejected. Each threshold value corresponds to a particular pair
of FAR and FRR values. The FAR and FRR error rates vary
inversely with respect to the threshold.

Both the FAR and the FRR are functions of the threshold T. A
curve can be traced to present the variation of the FRR as a
function of the FAR. Figure 11 shows an example of this curve,
called the ROC (Receiver Operating Characteristic) curve. It is
impossible to minimize both the FAR and the FRR simultaneously.
The best system is the one that has the lowest EER: if the EER is
low, then the FAR and FRR values at that operating point are also
low and the system therefore makes few mistakes.
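The threshold sweep described above can be sketched as follows. The score lists are made-up illustrative values, not data from the experiments, and the code assumes that a higher score means a better match, so that an access is accepted when its score reaches the threshold. The rates follow definitions (13) and (14), and the EER is located as the threshold where FAR and FRR are closest.

```python
def far_frr(genuine, impostor, threshold):
    """FAR and FRR at one threshold (accept when score >= threshold)."""
    frr = sum(s < threshold for s in genuine) / len(genuine)
    far = sum(s >= threshold for s in impostor) / len(impostor)
    return far, frr


def equal_error_rate(genuine, impostor):
    """Return (threshold, FAR, FRR) where |FAR - FRR| is smallest."""
    def gap(t):
        far, frr = far_frr(genuine, impostor, t)
        return abs(far - frr)

    # Every observed score is a candidate threshold.
    candidates = sorted(set(genuine) | set(impostor))
    best = min(candidates, key=gap)
    far, frr = far_frr(genuine, impostor, best)
    return best, far, frr


# Made-up similarity scores for illustration only.
GENUINE = [0.9, 0.8, 0.75, 0.6, 0.4]
IMPOSTOR = [0.1, 0.2, 0.3, 0.5, 0.7]

if __name__ == "__main__":
    for t in (0.2, 0.6, 0.8):
        print(t, far_frr(GENUINE, IMPOSTOR, t))
    print(equal_error_rate(GENUINE, IMPOSTOR))
```

At the low threshold the FAR is high and the FRR is zero, at the high threshold the situation is reversed, and the two rates meet at the intermediate threshold.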



Figure 11: ROC curves. [35] 

To evaluate the performance of our proposed multimodal
authentication system, a database containing face, voice and
fingerprint samples is required. In this work, we construct a
multimodal biometric database for our experiments from three
sources: the ORL (Olivetti Research Laboratory) face database,
which consists of 400 frontal faces from 40 subjects (10 images
of each subject); a subset of the TIMIT (Texas Instruments &
Massachusetts Institute of Technology) database, restricted to
40 classes, for the voice; and four different fingerprint
databases (DB1, DB2, DB3 and DB4), which were collected using
the following sensors/technologies:

• DB1: optical sensor "TouchView II" by Identix

• DB2: optical sensor "FX2000" by Biometrika

• DB3: capacitive sensor "100 SC" by Precise Biometrics

• DB4: synthetic fingerprint generation.

A. Experiments results for the proposed architecture with ANN 

Table 1 summarizes the performance of the ANN fusion of the 
used biometric modalities. 

Table 1: Performance of the modality fusion with ANN in
identification mode (recognition rate by HN and number of epochs)

Fusion of modalities      HN     1000 epochs   5000 epochs   10000 epochs
Voice/Face                5      18.39 %       20.11 %       22.14 %
                          10     21.55 %       28.7 %        31.55 %
                          50     34.60 %       40.37 %       43.75 %
                          100    44.9 %        48.03 %       56.40 %
Fingerprint/Face          5      15.03 %       21.69 %       27.15 %
                          10     23.90 %       28.00 %       34.29 %
                          50     35.85 %       37.12 %       43.65 %
                          100    42.01 %       45.2 %        54 %
Fingerprint/Face/Voice    5      11.00 %       18.50 %       27.87 %
                          10     22.30 %       28.32 %       35.61 %
                          50     35.85 %       41.64 %       48.30 %
                          100    43.97 %       48.2 %        57.69 %


An efficient and robust identification system is a priority.
From Table 1, we notice that the Fingerprint/Face/Voice system
provides the best recognition rate, reaching 57.69% with HN = 100
and 10000 epochs. As our system also supports authentication,
the recognition rate alone is not enough to evaluate its
performance. The following table therefore summarizes the values
of the three other performance criteria mentioned above.

Table 2: Performance evaluation of the fusion system using three
ANNs in authentication mode

                        FRR       FAR       EER
Fusion based on ANN     1.54 %    4.59 %    4.15 %

where FRR is the False Rejection Rate, FAR is the False
Acceptance Rate and EER is the Equal Error Rate.

B. Experiments results for the proposed architecture with SVM 

Table 3: Performance of the modality fusion with SVM
in identification mode

                                  Recognition rate
Fusion using linear kernel        58.72 %
Fusion using polynomial kernel    63.31 %

Table 4: Performance evaluation of the fusion system based on SVM
in authentication mode

                                  FRR           FAR       EER
Fusion using linear kernel        1.63 %        4.71 %    2.82 %
Fusion using polynomial kernel    [illegible]   4.52 %    2.35 %


These results allow a comparison of SVM and ANN for score-level
fusion.

The experimental results show an improvement of SVMs over ANNs:
the recognition rate rises from 57.69% (ANN) to 63.31% (SVM with
a polynomial kernel) and the EER falls from 4.15% to 2.35%. One
reason is that ANNs can suffer from multiple local minima,
whereas the solution of an SVM is global and unique. Two other
advantages of SVMs are that they have a simple geometric
interpretation and give a sparse solution.

From the experimental results we draw the following conclusions:

■ Fusing three different biometrics improves verification
accuracy compared with single biometrics.

■ Comparing the results of the SVM using a linear kernel
with those using a non-linear kernel, we note an
advantage for the non-linear kernel. This is because
convexity is an interesting and important property of
nonlinear SVM classifiers.

■ A better fusion effect is achieved by the SVM-based
fusion rule than by ANN-based score-level fusion.

■ This method is superior to the previous methods owing
to the application of the new recognition algorithms
and the SVM-based fusion rule.

■ Unlike that of SVMs, the computational complexity of
ANNs is proportional to the dimensionality of the input
space. ANNs use empirical risk minimization, while
SVMs use structural risk minimization. SVMs often
outperform ANNs in practice because they address the
biggest problem of ANNs: SVMs are less prone to
overfitting.

V. Conclusion 

In this paper, we introduced the concepts of unimodal and
multimodal biometric recognition. The principle is to design
unimodal recognition systems and combine their scores from
different biometric modalities to increase identification power.

This work provides a new contribution to the field of multimodal
biometrics. It demonstrates the authentication of individuals by
multimodal fusion based on ANN or SVM using fingerprint, face
and speech recognition. Among the various existing fusion
levels, we chose to work at the score level, which offers the
best compromise between richness of information and ease of
implementation.

The errors arising from the imperfection of a single biometric
are remedied by the fusion process, which ensures a better
recognition rate.

In addition, we detailed the concepts of classification by neural
networks and support vector machines for multimodal fusion.

In score-level fusion, SVM provides better performance than
ANN.

Future work will investigate better alternative recognition
techniques suitable for the fusion of fingerprint, speech and
face.

We think that the performance of multi-biometric systems can be
improved if a suitable fusion strategy is used, in particular for
systems running in an uncontrolled environment. It would
therefore be interesting to apply other fusion approaches and to
compare their results with those obtained with ANN and SVM in
order to maximize the performance of the multi-biometric system.